Deduplicating Compressed Contents in Cloud Storage Environment

Authors

  • Zhichao Yan
  • Hong Jiang
  • Yujuan Tan
  • Hao Luo

Abstract

Data compression and deduplication are two common approaches to increasing storage efficiency in the cloud environment. Both users and cloud service providers have economic incentives to compress their data before storing it in the cloud. However, our analysis indicates that compressed packages of different data, and differently compressed packages of the same data, are usually fundamentally different from one another even when they share a large amount of redundant data. Existing data deduplication systems cannot detect the redundant data among them. We propose the X-Ray Dedup approach, which extracts unique metadata from these packages, such as the "checksum" and "file length" information, and uses it as the compressed file's content signature to help detect and remove file-level data redundancy. Our evaluations show that X-Ray Dedup is capable of breaking through the boundaries of compressed packages and significantly reducing the size requirements of compressed packages, thus further optimizing storage space in the cloud.
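The signature idea can be illustrated with a minimal Python sketch, assuming ZIP packages and single-member gzip files. It reads each member's CRC-32 checksum and uncompressed length straight from archive metadata (the ZIP central directory, the gzip trailer) without decompressing anything, and treats that pair as a file-level signature. The names `zip_signatures`, `gzip_signature`, `SignatureIndex` and `make_zip` are hypothetical and only illustrate the concept; this is not X-Ray Dedup's actual implementation. Because a CRC-32 plus length pair can collide, a real system would need to confirm a match (for example, with a stronger hash) before discarding any data.

```python
import gzip
import io
import struct
import zipfile

# Hypothetical signature: (CRC-32 of the uncompressed member, uncompressed length).
Signature = tuple[int, int]


def zip_signatures(package: bytes) -> list[tuple[Signature, str]]:
    """Read each member's CRC-32 and uncompressed size from the ZIP
    central directory; no member is decompressed."""
    sigs = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                sigs.append(((info.CRC, info.file_size), info.filename))
    return sigs


def gzip_signature(package: bytes) -> Signature:
    """A single-member gzip file ends with an 8-byte trailer: CRC-32 and
    ISIZE (uncompressed length mod 2**32), both little-endian."""
    crc, isize = struct.unpack("<II", package[-8:])
    return (crc, isize)


class SignatureIndex:
    """Hypothetical file-level dedup index keyed by the metadata signature."""

    def __init__(self) -> None:
        self.owner: dict[Signature, str] = {}

    def add(self, package_name: str, sigs: list[tuple[Signature, str]]) -> list[str]:
        """Register a package's members; return those whose signature is already indexed."""
        duplicates = []
        for sig, member in sigs:
            location = f"{package_name}:{member}"
            if sig in self.owner:
                duplicates.append(location)
            else:
                self.owner[sig] = location
        return duplicates


def make_zip(files: dict[str, bytes], method: int) -> bytes:
    """Build an in-memory ZIP package for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


if __name__ == "__main__":
    report = b"quarterly numbers " * 1000
    logo = bytes(range(256)) * 40

    # Three byte-wise different packages; two of them contain the same report.
    pkg_a = make_zip({"report.txt": report, "logo.bin": logo}, zipfile.ZIP_DEFLATED)
    pkg_b = make_zip({"docs/report-copy.txt": report, "notes.txt": b"new"}, zipfile.ZIP_STORED)
    pkg_c = gzip.compress(report)

    index = SignatureIndex()
    print(index.add("a.zip", zip_signatures(pkg_a)))  # []
    print(index.add("b.zip", zip_signatures(pkg_b)))  # ['b.zip:docs/report-copy.txt']
    print(index.add("c.gz", [(gzip_signature(pkg_c), "report")]))  # ['c.gz:report']
```

The three packages differ byte for byte (different compression methods, container formats and member names), so whole-package or block-level hashing would see no overlap. Comparing the stored checksum and length still flags the report inside `b.zip` and `c.gz` as a candidate duplicate of the copy already indexed from `a.zip`.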

Similar resources

SPORT: Sharing Proofs of Retrievability across Tenants

Proofs of Retrievability (POR) are cryptographic proofs which provide assurance to a single tenant (who creates tags using his secret material) that his files can be retrieved in their entirety. However, POR schemes completely ignore storage-efficiency concepts, such as multi-tenancy and data deduplication, which are being widely utilized by existing cloud storage providers. Namely, in deduplic...

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

Data deduplication describes an approach that reduces the storage capacity needed to store data, or the amount of data that has to be transferred over the network. Cloud storage has received increasing attention from industry as it offers infinite storage resources that are available on demand. Source deduplication is useful in cloud backup, as it saves network bandwidth and reduces network space. Deduplication is the pr...

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

An Efficient Secret Sharing-based Storage System for Cloud-based Internet of Things

Internet of things (IoTs) is the newfound information architecture based on the internet that develops interactions between objects and services in a secure and reliable environment. As the availability of many smart devices rises, secure and scalable mass storage systems for aggregate data are required in IoT applications. In this paper, we propose a new method for storing aggregate data in Io...

A Literature Review on Cloud Computing Security Issues

The use of Cloud Computing has increased rapidly in many organizations. Cloud Computing provides many benefits in terms of low cost and accessibility of data. In addition, Cloud Computing was predicted to transform the computing world from using local applications and storage into centralized services provided by organizations.[10] Ensuring the security of Cloud Computing is a major factor in the Clo...

Publication date: 2016